-
Notifications
You must be signed in to change notification settings - Fork 7.7k
kernel: clarify and assert interrupts aren't disabled upon context switch #93440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
kernel: clarify and assert interrupts aren't disabled upon context switch #93440
Conversation
The do_swap() routine used when CONFIG_USE_SWITCH=y asserts that caller thread does not hold any spinlock when CONFIG_SPIN_VALIDATE is enabled. However, there is no similar check in place when CONFIG_USE_SWITCH=n. Copy this assertion in the USE_SWITCH=n implementation of z_swap_irqlock(). Signed-off-by: Mathieu Choplain <mathieu.choplain@st.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For clarity: this is useful for architectures like arm32 that have an arch_swap() and miss the check elsewhere. The error being detected is always wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this caught the bug in my case where a thread holding the spinlock tries to sleep (mutex).
For the record, this patch breaks part of the Lines 242 to 251 in 1f69b91
Using QEMU, I verified that the existing assertion (when diff --git a/samples/hello_world/src/main.c b/samples/hello_world/src/main.c
index c550ab461cb..be87d15501c 100644
--- a/samples/hello_world/src/main.c
+++ b/samples/hello_world/src/main.c
@@ -5,10 +5,13 @@
*/
#include <stdio.h>
+#include <zephyr/kernel.h>
int main(void)
{
+ irq_lock();
printf("Hello World! %s\n", CONFIG_BOARD_TARGET);
+ k_msleep(1000);
return 0;
} $ # with upstream
$ west build -b qemu_x86_64 samples/hello_world/ -DCONFIG_USE_SWITCH=y -DCONFIG_ASSERT=y -DCONFIG_SPIN_VALIDATE=y -p
$ west build -t run
-- west build: running target run
[3/4] To exit from QEMU enter: 'CTRL+a, x'[QEMU] CPU: qemu64,+x2apic
qemu-system-x86_64: warning: TCG doesn't support requested feature: CPUID.01H:ECX.x2apic [bit 21]
qemu-system-x86_64: warning: TCG doesn't support requested feature: CPUID.01H:ECX.x2apic [bit 21]
SeaBIOS (version zephyr-v1.0.0-0-g31d4e0e-dirty-20200714_234759-fv-az50-zephyr)
Booting from ROM..
*** Booting Zephyr OS build v4.2.0-334-g124fb897b490 ***
Hello World! qemu_x86_64/atom
ASSERTION FAIL [arch_irq_unlocked(key) || z_smp_current_get()->base.thread_state & (((1UL << (0))) | ((1UL << (3))))] @ WEST_TOPDIR/zephyr/kernel/include/kswap.h:98
Context switching while holding lock!
RAX: 0x0000000000000004 RBX: 0x000000000011fd90 RCX: 0x0000000000000001 RDX: 0x0000000000000000
RSI: 0x0000000000000062 RDI: 0x00000000001090d2 RBP: 0x000000000012bd70 RSP: 0x000000000012bd48
R8: 0x0000000000000000 R9: 0x0000000000000000 R10: 0x0000000000000002 R11: 0x0000000000000000
R12: 0x000000000010b500 R13: 0x0000000000000000 R14: 0x0000000000000000 R15: 0x000000000012be48
RSP: 0x000000000012bd48 RFLAGS: 0x0000000000000002 CS: 0x0018 CR3: 0x0000000000136000
RIP: 0x000000000010046f
call trace:
0: 0x00000000001051d5
1: 0x000000000010524d
2: 0x0000000000100025
3: 0x0000000000102f2d
4: 0x000000000010045a On a related note, the zephyr/include/zephyr/spinlock.h Lines 154 to 181 in 1f69b91
The options I see are:
|
My vote would just be to remove that section of the docs and rip the bandaid off with the warning (which can always be disabled via kconfig anyway). There are precisely zero circumstances where breaking a critical section (!) lock with a context switch (!!) cannot lead to tears. Code that does that is not only breaking its own carefully curated lock in an immensely clever way that it clearly thought carefully about and knows has no unexpected interactions. Because of the recursive behavior of irq_lock() it's also breaking the locks taken by all the callers up the stack, who had no idea that calling into this obnoxiously clever subsystem was a synchronization trap. It just can't work. We never should have documented it (and I didn't even know we had!). |
What do you think of replacing the existing note: Lines 242 to 251 in 47b07e5
with the following?
These statements sound correct from my understanding of the kernel:
We could add a more direct If this is fine for you, I'll add a new commit that edits |
How about the infamous Linux wording |
We already have a runtime message 🙂 Lines 98 to 100 in 47b07e5
What needs updating is the documentation for |
I think we need the explicit wording of "It is a fatal error to hold the interrupt lock upon context switch", since a thread can force a reschedule/context switch at any time by calling k_yield() for example. |
I don't think (Really, my |
I disagree with the way the note is worded, since when I'm reading it, I'm reading the three lines as a strangely phrased paragraph, and in my mind auto-correcting it to make it make sense. Could you split the three notes into three notes, or phrase it as a paragraph?
or
|
Bikeshed aside, I think adding a warning in the docs is fine, but IMHO it should be as short and clear as possible. "Context switching while holding the irq_lock is illegal." is fine, etc... The user will be alerted by the assert if they trip over it. (Also nitpick: it's not a "fatal error" as nothing actually fails or panics in an obvious way, which is why this is a booby trap.) |
Upon context switch, the virtual "interrupt lock" acquired by irq_lock() must not be "held". However, the current documentation for irq_lock() says that it is perfectly valid to hold it (!), and that a suspended thread will hold the "interrupt lock" upon being scheduled again (!!). Update the documentation to remove the outdated section and indicate that context switching while holding the interrupt lock is not allowed. Signed-off-by: Mathieu Choplain <mathieu.choplain@st.com>
Threads must not attempt to context switch if they are holding a spinlock. Add this information to the documentation for k_spin_lock(). Signed-off-by: Mathieu Choplain <mathieu.choplain@st.com>
a173881
a173881
Modified the |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks clear to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems perfect
The
do_swap()
routine used whenCONFIG_USE_SWITCH=y
asserts that caller thread does not hold any spinlock whenCONFIG_SPIN_VALIDATE
is enabled. However, there is no similar check in place whenCONFIG_USE_SWITCH=n
.Copy this assertion in the
USE_SWITCH=n
implementation ofz_swap_irqlock()
.